In [1]:
pip install plotly
Requirement already satisfied: plotly in c:\users\shahd\anaconda3\lib\site-packages (4.7.1)
Requirement already satisfied: six in c:\users\shahd\anaconda3\lib\site-packages (from plotly) (1.12.0)
Requirement already satisfied: retrying>=1.3.3 in c:\users\shahd\anaconda3\lib\site-packages (from plotly) (1.3.3)
Note: you may need to restart the kernel to use updated packages.

The dataset was created by IBM employees and downloaded from Kaggle. It is fictional and does not represent any actual IBM employees.

Attrition: the turnover rate of employees within an organization.

This can happen for many reasons:

- Employees looking for better opportunities
- A negative working environment
- Bad management
- Sickness of an employee (or even death)
- Excessive working hours
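
As a rough illustration of the turnover-rate definition above, the share of employees who left can be computed directly from a Yes/No attrition column. This is a minimal sketch on hypothetical toy data (the `toy` frame and its values are made up for illustration and are not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data: 'Yes' means the employee left, 'No' means they stayed.
toy = pd.DataFrame(
    {"Attrition": ["Yes", "No", "No", "No", "Yes", "No", "No", "No"]}
)

# Attrition rate = number of employees who left / total number of employees.
left = int((toy["Attrition"] == "Yes").sum())
attrition_rate = left / len(toy)

print(f"Left: {left} of {len(toy)}")
print(f"Attrition rate: {attrition_rate:.1%}")
```

The same two lines should work on the real frame once it is loaded, assuming its `Attrition` column holds only 'Yes' and 'No'.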

The objective is

It starts from framing business question t

1. Import required packages

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

2. Data Extraction

Load the dataset and get a clear understanding of its attributes.

In [3]:
# read the CSV file into a DataFrame
df = pd.read_csv('emp_attrition.csv')
In [4]:
# show the first 10 rows
df.head(10)
Out[4]:
|   | Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EmployeeNumber | ... | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41 | Yes | Travel_Rarely | 1102 | Sales | 1 | 2 | Life Sciences | 1 | 1 | ... | 1 | 80 | 0 | 8 | 0 | 1 | 6 | 4 | 0 | 5 |
| 1 | 49 | No | Travel_Frequently | 279 | Research & Development | 8 | 1 | Life Sciences | 1 | 2 | ... | 4 | 80 | 1 | 10 | 3 | 3 | 10 | 7 | 1 | 7 |
| 2 | 37 | Yes | Travel_Rarely | 1373 | Research & Development | 2 | 2 | Other | 1 | 4 | ... | 2 | 80 | 0 | 7 | 3 | 3 | 0 | 0 | 0 | 0 |
| 3 | 33 | No | Travel_Frequently | 1392 | Research & Development | 3 | 4 | Life Sciences | 1 | 5 | ... | 3 | 80 | 0 | 8 | 3 | 3 | 8 | 7 | 3 | 0 |
| 4 | 27 | No | Travel_Rarely | 591 | Research & Development | 2 | 1 | Medical | 1 | 7 | ... | 4 | 80 | 1 | 6 | 3 | 3 | 2 | 2 | 2 | 2 |
| 5 | 32 | No | Travel_Frequently | 1005 | Research & Development | 2 | 2 | Life Sciences | 1 | 8 | ... | 3 | 80 | 0 | 8 | 2 | 2 | 7 | 7 | 3 | 6 |
| 6 | 59 | No | Travel_Rarely | 1324 | Research & Development | 3 | 3 | Medical | 1 | 10 | ... | 1 | 80 | 3 | 12 | 3 | 2 | 1 | 0 | 0 | 0 |
| 7 | 30 | No | Travel_Rarely | 1358 | Research & Development | 24 | 1 | Life Sciences | 1 | 11 | ... | 2 | 80 | 1 | 1 | 2 | 3 | 1 | 0 | 0 | 0 |
| 8 | 38 | No | Travel_Frequently | 216 | Research & Development | 23 | 3 | Life Sciences | 1 | 12 | ... | 2 | 80 | 0 | 10 | 2 | 3 | 9 | 7 | 1 | 8 |
| 9 | 36 | No | Travel_Rarely | 1299 | Research & Development | 27 | 3 | Medical | 1 | 13 | ... | 2 | 80 | 2 | 17 | 3 | 2 | 7 | 7 | 7 | 7 |

10 rows × 35 columns
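
Before exploring further, it can help to confirm that the load produced the columns and labels the analysis expects. The sketch below is only an illustration: it reads a hypothetical three-row inline sample (standing in for `emp_attrition.csv`) and the `expected` column set is an assumed subset chosen for the example.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for emp_attrition.csv.
sample_csv = io.StringIO(
    "Age,Attrition,Department\n"
    "41,Yes,Sales\n"
    "49,No,Research & Development\n"
    "37,Yes,Research & Development\n"
)
sample = pd.read_csv(sample_csv)

# Columns the later steps rely on (an assumed subset, for illustration only).
expected = {"Age", "Attrition", "Department"}
missing = expected - set(sample.columns)
if missing:
    raise ValueError(f"Missing expected columns: {sorted(missing)}")

# The target column should contain only the two labels 'Yes' and 'No'.
labels = set(sample["Attrition"].unique())
if not labels <= {"Yes", "No"}:
    raise ValueError(f"Unexpected Attrition labels: {sorted(labels)}")

print(sample.shape, sorted(labels))
```

Failing early with a clear message tends to be easier to debug than a `KeyError` several cells later.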

In [5]:
# explore the shape of the dataset
print('Rows x Columns : ', df.shape[0], 'x', df.shape[1])
Rows x Columns :  1470 x 35
In [6]:
# list all column names
print('Features: \n', df.columns.tolist())
Features: 
 ['Age', 'Attrition', 'BusinessTravel', 'DailyRate', 'Department', 'DistanceFromHome', 'Education', 'EducationField', 'EmployeeCount', 'EmployeeNumber', 'EnvironmentSatisfaction', 'Gender', 'HourlyRate', 'JobInvolvement', 'JobLevel', 'JobRole', 'JobSatisfaction', 'MaritalStatus', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'Over18', 'OverTime', 'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion', 'YearsWithCurrManager']
In [7]:
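# summarize column dtypes and non-null counts
# (df.info() prints the summary itself and returns None, so print() also emits a trailing "None")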
print(df.info())
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Data columns (total 35 columns):
Age                         1470 non-null int64
Attrition                   1470 non-null object
BusinessTravel              1470 non-null object
DailyRate                   1470 non-null int64
Department                  1470 non-null object
DistanceFromHome            1470 non-null int64
Education                   1470 non-null int64
EducationField              1470 non-null object
EmployeeCount               1470 non-null int64
EmployeeNumber              1470 non-null int64
EnvironmentSatisfaction     1470 non-null int64
Gender                      1470 non-null object
HourlyRate                  1470 non-null int64
JobInvolvement              1470 non-null int64
JobLevel                    1470 non-null int64
JobRole                     1470 non-null object
JobSatisfaction             1470 non-null int64
MaritalStatus               1470 non-null object
MonthlyIncome               1470 non-null int64
MonthlyRate                 1470 non-null int64
NumCompaniesWorked          1470 non-null int64
Over18                      1470 non-null object
OverTime                    1470 non-null object
PercentSalaryHike           1470 non-null int64
PerformanceRating           1470 non-null int64
RelationshipSatisfaction    1470 non-null int64
StandardHours               1470 non-null int64
StockOptionLevel            1470 non-null int64
TotalWorkingYears           1470 non-null int64
TrainingTimesLastYear       1470 non-null int64
WorkLifeBalance             1470 non-null int64
YearsAtCompany              1470 non-null int64
YearsInCurrentRole          1470 non-null int64
YearsSinceLastPromotion     1470 non-null int64
YearsWithCurrManager        1470 non-null int64
dtypes: int64(26), object(9)
memory usage: 402.1+ KB
None
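
The summary above reports two kinds of columns (integer and object) and no missing values. One way to make that split explicit for later steps is sketched below on a hypothetical four-column toy frame; the names `categorical_cols`, `numerical_cols` and `missing_per_col` are introduced here for illustration and do not appear in the original notebook.

```python
import pandas as pd

# Hypothetical toy frame with a mix of numeric and text columns.
toy = pd.DataFrame(
    {
        "Age": [41, 49, 37],
        "Attrition": ["Yes", "No", "Yes"],
        "Department": ["Sales", "Research & Development", "Research & Development"],
        "DailyRate": [1102, 279, 1373],
    }
)

# Numeric columns, and everything else treated as categorical.
numerical_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

# Count of missing values per column; all zeros means nothing to impute.
missing_per_col = toy.isnull().sum()

print("Numerical:  ", numerical_cols)
print("Categorical:", categorical_cols)
print("Total missing values:", int(missing_per_col.sum()))
```

Note that some integer columns in the real data may encode ordinal categories rather than true quantities, so a dtype-based split is only a starting point.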
In [8]:
# count the unique values per column, then list them for each column
print('\nUnique values:')
print(df.nunique())
for col in df.columns:
    print(col, ':', sorted(df[col].unique()))
Unique values:
Age                           43
Attrition                      2
BusinessTravel                 3
DailyRate                    886
Department                     3
DistanceFromHome              29
Education                      5
EducationField                 6
EmployeeCount                  1
EmployeeNumber              1470
EnvironmentSatisfaction        4
Gender                         2
HourlyRate                    71
JobInvolvement                 4
JobLevel                       5
JobRole                        9
JobSatisfaction                4
MaritalStatus                  3
MonthlyIncome               1349
MonthlyRate                 1427
NumCompaniesWorked            10
Over18                         1
OverTime                       2
PercentSalaryHike             15
PerformanceRating              2
RelationshipSatisfaction       4
StandardHours                  1
StockOptionLevel               4
TotalWorkingYears             40
TrainingTimesLastYear          7
WorkLifeBalance                4
YearsAtCompany                37
YearsInCurrentRole            19
YearsSinceLastPromotion       16
YearsWithCurrManager          18
dtype: int64
Age : [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
Attrition : ['No', 'Yes']
BusinessTravel : ['Non-Travel', 'Travel_Frequently', 'Travel_Rarely']
DailyRate : [102, 103, 104, 105, 106, 107, 109, 111, 115, 116, 117, 118, 119, 120, 121, 124, 125, 128, 129, 130, 131, 132, 134, 135, 136, 138, 140, 141, 142, 143, 144, 145, 146, 147, 148, 150, 152, 153, 154, 155, 156, 157, 160, 161, 163, 164, 167, 168, 170, 172, 174, 176, 177, 179, 180, 181, 182, 185, 188, 189, 192, 193, 194, 195, 196, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 210, 211, 213, 216, 217, 218, 219, 224, 228, 230, 231, 232, 234, 237, 238, 240, 241, 243, 244, 247, 248, 249, 251, 252, 253, 254, 256, 258, 261, 263, 264, 265, 266, 267, 268, 269, 271, 277, 279, 280, 282, 285, 286, 287, 288, 289, 290, 294, 296, 299, 300, 301, 302, 303, 304, 305, 306, 307, 309, 310, 311, 313, 314, 316, 317, 318, 319, 322, 325, 326, 328, 329, 330, 332, 333, 334, 335, 336, 337, 341, 342, 343, 345, 346, 350, 352, 355, 359, 360, 362, 363, 364, 365, 367, 369, 370, 371, 373, 374, 376, 377, 379, 381, 383, 384, 390, 391, 392, 394, 395, 397, 401, 404, 405, 406, 408, 410, 412, 413, 414, 415, 419, 422, 426, 427, 428, 429, 430, 431, 432, 433, 436, 437, 438, 439, 440, 441, 442, 443, 444, 446, 447, 448, 452, 457, 458, 459, 461, 462, 464, 465, 466, 467, 468, 469, 470, 471, 472, 474, 477, 478, 479, 480, 481, 482, 483, 486, 488, 489, 490, 492, 495, 496, 498, 499, 500, 501, 504, 505, 506, 507, 508, 509, 511, 515, 516, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 534, 535, 536, 538, 541, 542, 543, 544, 546, 548, 549, 552, 553, 555, 556, 557, 558, 559, 560, 561, 562, 563, 566, 567, 570, 571, 572, 573, 574, 575, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 598, 599, 600, 601, 602, 603, 604, 605, 607, 608, 611, 613, 616, 617, 618, 619, 621, 622, 625, 626, 628, 629, 630, 632, 634, 635, 636, 638, 640, 641, 642, 644, 645, 646, 647, 648, 650, 652, 653, 654, 655, 657, 658, 660, 661, 662, 663, 664, 665, 666, 667, 669, 670, 671, 672, 673, 674, 675, 676, 678, 682, 683, 684, 685, 688, 689, 691, 692, 693, 694, 696, 697, 699, 702, 703, 704, 705, 706, 708, 711, 713, 
715, 717, 718, 719, 720, 721, 722, 723, 725, 726, 727, 728, 734, 735, 736, 737, 738, 740, 741, 746, 748, 750, 754, 755, 756, 759, 760, 763, 766, 769, 770, 771, 772, 773, 775, 776, 781, 782, 783, 786, 788, 790, 791, 792, 793, 796, 797, 798, 799, 801, 802, 804, 805, 806, 807, 809, 810, 812, 813, 817, 818, 819, 821, 823, 824, 827, 828, 829, 830, 832, 833, 835, 836, 840, 841, 845, 848, 849, 852, 853, 854, 855, 857, 859, 862, 866, 867, 868, 869, 871, 874, 878, 879, 880, 881, 882, 883, 884, 885, 887, 888, 889, 890, 891, 894, 895, 896, 897, 898, 902, 903, 905, 906, 911, 913, 916, 917, 918, 920, 921, 922, 924, 926, 928, 929, 930, 931, 932, 933, 935, 937, 938, 939, 942, 943, 944, 945, 946, 949, 950, 953, 954, 955, 956, 957, 959, 964, 966, 967, 968, 969, 970, 971, 974, 975, 976, 977, 979, 981, 982, 983, 984, 986, 988, 989, 990, 991, 992, 994, 995, 996, 999, 1001, 1002, 1003, 1005, 1009, 1010, 1012, 1015, 1017, 1018, 1023, 1029, 1030, 1031, 1033, 1034, 1035, 1037, 1038, 1040, 1041, 1045, 1046, 1050, 1052, 1053, 1054, 1055, 1059, 1060, 1062, 1063, 1064, 1065, 1066, 1069, 1070, 1075, 1076, 1079, 1082, 1083, 1084, 1085, 1086, 1089, 1090, 1091, 1092, 1093, 1094, 1096, 1097, 1098, 1099, 1102, 1103, 1105, 1107, 1108, 1109, 1111, 1112, 1115, 1116, 1117, 1120, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1130, 1131, 1132, 1134, 1136, 1137, 1138, 1141, 1142, 1144, 1145, 1146, 1147, 1150, 1151, 1153, 1154, 1157, 1158, 1162, 1167, 1168, 1169, 1171, 1172, 1174, 1176, 1178, 1179, 1180, 1181, 1182, 1184, 1186, 1188, 1189, 1192, 1193, 1194, 1195, 1198, 1199, 1200, 1202, 1204, 1206, 1207, 1210, 1211, 1212, 1213, 1214, 1216, 1217, 1218, 1219, 1220, 1221, 1222, 1223, 1224, 1225, 1229, 1230, 1231, 1232, 1234, 1236, 1238, 1239, 1240, 1242, 1243, 1245, 1246, 1247, 1249, 1251, 1252, 1253, 1254, 1255, 1256, 1258, 1259, 1261, 1262, 1265, 1266, 1268, 1269, 1271, 1272, 1273, 1274, 1275, 1276, 1277, 1278, 1280, 1282, 1283, 1287, 1288, 1291, 1294, 1296, 1297, 1299, 1300, 1302, 1303, 1305, 1306, 1308, 1309, 
1311, 1312, 1313, 1315, 1316, 1318, 1319, 1320, 1321, 1322, 1323, 1324, 1325, 1326, 1327, 1328, 1329, 1330, 1332, 1333, 1334, 1336, 1337, 1339, 1342, 1343, 1344, 1346, 1349, 1351, 1353, 1354, 1355, 1356, 1357, 1358, 1360, 1361, 1362, 1365, 1366, 1368, 1369, 1370, 1371, 1372, 1373, 1375, 1376, 1377, 1378, 1379, 1380, 1381, 1382, 1383, 1384, 1385, 1387, 1389, 1391, 1392, 1394, 1395, 1396, 1397, 1398, 1400, 1401, 1402, 1403, 1404, 1405, 1410, 1411, 1413, 1416, 1418, 1420, 1421, 1422, 1423, 1425, 1427, 1429, 1431, 1434, 1435, 1436, 1439, 1440, 1441, 1442, 1443, 1444, 1445, 1448, 1449, 1450, 1451, 1452, 1454, 1456, 1457, 1459, 1462, 1463, 1464, 1465, 1467, 1469, 1470, 1473, 1474, 1475, 1476, 1479, 1480, 1482, 1485, 1488, 1490, 1492, 1495, 1496, 1498, 1499]
Department : ['Human Resources', 'Research & Development', 'Sales']
DistanceFromHome : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
Education : [1, 2, 3, 4, 5]
EducationField : ['Human Resources', 'Life Sciences', 'Marketing', 'Medical', 'Other', 'Technical Degree']
EmployeeCount : [1]
EmployeeNumber : [1, 2, 4, 5, 7, 8, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 36, 38, 39, 40, 41, 42, 45, 46, 47, 49, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62, 63, 64, 65, 68, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 83, 84, 85, 86, 88, 90, 91, 94, 95, 96, 97, 98, 100, 101, 102, 103, 104, 105, 106, 107, 110, 112, 113, 116, 117, 118, 119, 120, 121, 124, 125, 126, 128, 129, 131, 132, 133, 134, 137, 138, 139, 140, 141, 142, 143, 144, 145, 147, 148, 150, 151, 152, 153, 154, 155, 158, 159, 160, 161, 162, 163, 164, 165, 167, 169, 170, 171, 174, 175, 176, 177, 178, 179, 182, 183, 184, 190, 192, 193, 194, 195, 197, 198, 199, 200, 201, 202, 204, 205, 206, 207, 208, 211, 214, 215, 216, 217, 218, 221, 223, 224, 226, 227, 228, 230, 231, 233, 235, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 252, 253, 254, 256, 258, 259, 260, 261, 262, 264, 267, 269, 270, 271, 273, 274, 275, 277, 281, 282, 283, 284, 286, 287, 288, 291, 292, 293, 296, 297, 298, 299, 300, 302, 303, 304, 305, 306, 307, 308, 309, 311, 312, 314, 315, 316, 319, 321, 323, 325, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 346, 347, 349, 350, 351, 352, 353, 355, 359, 361, 362, 363, 364, 366, 367, 369, 372, 373, 374, 376, 377, 378, 379, 380, 381, 382, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 399, 401, 403, 404, 405, 406, 407, 408, 410, 411, 412, 416, 417, 419, 420, 421, 422, 423, 424, 425, 426, 428, 429, 430, 431, 433, 434, 436, 437, 438, 439, 440, 441, 442, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 458, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 473, 474, 475, 476, 477, 478, 479, 481, 482, 483, 484, 485, 486, 487, 488, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 505, 507, 508, 510, 511, 513, 514, 515, 516, 517, 518, 520, 521, 522, 523, 524, 525, 526, 527, 529, 530, 531, 532, 533, 534, 536, 538, 543, 544, 546, 547, 548, 549, 
550, 551, 554, 555, 556, 558, 560, 562, 564, 565, 566, 567, 568, 569, 571, 573, 574, 575, 577, 578, 579, 580, 581, 582, 584, 585, 586, 587, 590, 591, 592, 593, 595, 597, 599, 600, 601, 602, 604, 605, 606, 608, 611, 612, 613, 614, 615, 616, 618, 620, 621, 622, 623, 624, 625, 626, 630, 631, 632, 634, 635, 638, 639, 641, 643, 644, 645, 647, 648, 649, 650, 652, 653, 655, 656, 657, 659, 661, 662, 663, 664, 665, 666, 667, 669, 671, 675, 677, 679, 680, 682, 683, 684, 686, 689, 690, 691, 692, 698, 699, 700, 701, 702, 704, 705, 707, 709, 710, 712, 714, 715, 716, 717, 720, 721, 722, 723, 724, 725, 727, 728, 729, 730, 731, 732, 733, 734, 738, 741, 742, 743, 744, 746, 747, 749, 752, 754, 757, 758, 760, 762, 763, 764, 766, 769, 771, 772, 773, 775, 776, 780, 781, 783, 784, 785, 786, 787, 789, 791, 792, 793, 796, 797, 799, 800, 802, 803, 804, 805, 806, 807, 808, 809, 811, 812, 813, 815, 816, 817, 819, 820, 823, 824, 825, 826, 827, 828, 829, 830, 832, 833, 834, 836, 837, 838, 840, 842, 843, 844, 845, 846, 847, 848, 850, 851, 852, 854, 855, 856, 857, 859, 861, 862, 864, 865, 867, 868, 869, 872, 874, 875, 878, 879, 880, 881, 882, 885, 887, 888, 889, 893, 894, 895, 896, 897, 899, 900, 901, 902, 903, 904, 905, 909, 910, 911, 912, 913, 916, 918, 920, 922, 923, 924, 925, 926, 927, 930, 932, 933, 934, 936, 939, 940, 941, 942, 944, 945, 947, 949, 950, 951, 952, 954, 956, 957, 958, 959, 960, 961, 964, 966, 967, 969, 970, 972, 974, 975, 976, 977, 981, 982, 983, 984, 985, 986, 987, 990, 991, 992, 994, 995, 996, 997, 998, 999, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1022, 1024, 1025, 1026, 1027, 1028, 1029, 1030, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1042, 1043, 1044, 1045, 1046, 1047, 1048, 1049, 1050, 1052, 1053, 1055, 1056, 1060, 1061, 1062, 1066, 1068, 1069, 1070, 1071, 1073, 1074, 1076, 1077, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1088, 1092, 1094, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1105, 
1106, 1107, 1108, 1109, 1111, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1124, 1125, 1126, 1127, 1128, 1131, 1132, 1133, 1135, 1136, 1137, 1138, 1140, 1143, 1148, 1150, 1152, 1154, 1156, 1157, 1158, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1171, 1172, 1173, 1175, 1177, 1179, 1180, 1182, 1184, 1185, 1188, 1190, 1191, 1192, 1193, 1195, 1196, 1198, 1200, 1201, 1202, 1203, 1204, 1206, 1207, 1210, 1211, 1212, 1215, 1216, 1217, 1218, 1219, 1220, 1221, 1224, 1225, 1226, 1228, 1231, 1233, 1234, 1235, 1237, 1238, 1239, 1240, 1241, 1242, 1243, 1244, 1245, 1246, 1248, 1249, 1250, 1251, 1252, 1254, 1255, 1256, 1257, 1258, 1259, 1260, 1263, 1264, 1265, 1267, 1268, 1269, 1270, 1273, 1275, 1277, 1278, 1279, 1280, 1281, 1282, 1283, 1285, 1286, 1288, 1289, 1291, 1292, 1293, 1294, 1295, 1296, 1297, 1298, 1299, 1301, 1303, 1304, 1306, 1307, 1308, 1309, 1310, 1311, 1312, 1314, 1315, 1317, 1318, 1319, 1321, 1322, 1324, 1329, 1331, 1333, 1334, 1336, 1338, 1340, 1344, 1346, 1349, 1350, 1352, 1355, 1356, 1358, 1360, 1361, 1362, 1363, 1364, 1367, 1368, 1369, 1371, 1372, 1373, 1374, 1375, 1377, 1379, 1380, 1382, 1383, 1387, 1389, 1390, 1391, 1392, 1394, 1395, 1396, 1397, 1399, 1401, 1402, 1403, 1405, 1407, 1408, 1409, 1411, 1412, 1415, 1417, 1419, 1420, 1421, 1422, 1423, 1424, 1425, 1427, 1428, 1430, 1431, 1433, 1434, 1435, 1436, 1438, 1439, 1440, 1441, 1443, 1445, 1446, 1447, 1448, 1449, 1453, 1457, 1458, 1459, 1460, 1461, 1464, 1465, 1466, 1467, 1468, 1469, 1471, 1472, 1473, 1474, 1475, 1477, 1478, 1479, 1480, 1481, 1482, 1483, 1484, 1485, 1486, 1487, 1489, 1492, 1494, 1495, 1496, 1497, 1499, 1501, 1502, 1503, 1504, 1506, 1507, 1509, 1513, 1514, 1515, 1516, 1520, 1522, 1523, 1525, 1527, 1529, 1533, 1534, 1535, 1537, 1539, 1541, 1542, 1543, 1544, 1545, 1546, 1547, 1548, 1549, 1550, 1551, 1552, 1553, 1554, 1555, 1556, 1557, 1558, 1560, 1562, 1563, 1564, 1568, 1569, 1572, 1573, 1574, 1576, 1577, 1578, 1580, 1581, 1582, 1583, 1585, 1586, 1587, 1588, 1590, 1591, 1592, 1594, 
1595, 1596, 1597, 1598, 1599, 1601, 1602, 1604, 1605, 1606, 1607, 1608, 1609, 1611, 1612, 1613, 1614, 1615, 1617, 1618, 1619, 1621, 1622, 1623, 1624, 1625, 1627, 1628, 1630, 1631, 1633, 1635, 1638, 1639, 1640, 1641, 1642, 1644, 1645, 1646, 1647, 1648, 1649, 1650, 1651, 1653, 1654, 1655, 1656, 1657, 1658, 1659, 1661, 1662, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1673, 1674, 1675, 1676, 1677, 1678, 1680, 1681, 1682, 1683, 1684, 1687, 1689, 1691, 1692, 1693, 1694, 1696, 1697, 1698, 1700, 1701, 1702, 1703, 1704, 1706, 1707, 1708, 1709, 1710, 1712, 1714, 1716, 1718, 1719, 1720, 1721, 1722, 1724, 1725, 1727, 1728, 1729, 1731, 1732, 1733, 1734, 1735, 1736, 1737, 1739, 1740, 1744, 1745, 1746, 1747, 1749, 1751, 1752, 1753, 1754, 1755, 1756, 1757, 1758, 1760, 1761, 1762, 1763, 1764, 1766, 1767, 1768, 1770, 1771, 1772, 1774, 1775, 1778, 1779, 1780, 1782, 1783, 1784, 1786, 1787, 1789, 1790, 1792, 1794, 1797, 1798, 1799, 1800, 1801, 1802, 1803, 1804, 1805, 1807, 1809, 1812, 1813, 1814, 1815, 1816, 1818, 1821, 1822, 1823, 1824, 1826, 1827, 1829, 1830, 1833, 1834, 1835, 1836, 1837, 1839, 1842, 1844, 1845, 1847, 1849, 1850, 1852, 1853, 1854, 1856, 1857, 1858, 1859, 1860, 1862, 1863, 1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871, 1873, 1875, 1876, 1878, 1880, 1881, 1882, 1883, 1885, 1886, 1888, 1890, 1892, 1893, 1898, 1900, 1903, 1905, 1907, 1908, 1909, 1911, 1912, 1915, 1916, 1918, 1922, 1924, 1927, 1928, 1929, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1943, 1944, 1945, 1947, 1948, 1949, 1950, 1951, 1952, 1954, 1955, 1956, 1960, 1961, 1962, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1979, 1980, 1981, 1982, 1985, 1986, 1987, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2003, 2007, 2008, 2009, 2010, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2031, 2032, 2034, 2035, 2036, 2037, 2038, 2040, 2041, 2044, 2045, 2046, 2048, 2049, 2051, 2052, 2053, 2054, 2055, 
2056, 2057, 2060, 2061, 2062, 2064, 2065, 2068]
EnvironmentSatisfaction : [1, 2, 3, 4]
Gender : ['Female', 'Male']
HourlyRate : [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
JobInvolvement : [1, 2, 3, 4]
JobLevel : [1, 2, 3, 4, 5]
JobRole : ['Healthcare Representative', 'Human Resources', 'Laboratory Technician', 'Manager', 'Manufacturing Director', 'Research Director', 'Research Scientist', 'Sales Executive', 'Sales Representative']
JobSatisfaction : [1, 2, 3, 4]
MaritalStatus : ['Divorced', 'Married', 'Single']
MonthlyIncome : [1009, 1051, 1052, 1081, 1091, 1102, 1118, 1129, 1200, 1223, 1232, 1261, 1274, 1281, 1359, 1393, 1416, 1420, 1483, 1514, 1555, 1563, 1569, 1601, 1611, 1675, 1702, 1706, 1790, 1859, 1878, 1904, 1951, 2001, 2007, 2008, 2011, 2013, 2014, 2018, 2022, 2024, 2028, 2029, 2033, 2042, 2044, 2045, 2058, 2061, 2062, 2064, 2066, 2070, 2073, 2074, 2075, 2080, 2083, 2086, 2088, 2089, 2090, 2093, 2096, 2097, 2099, 2105, 2107, 2109, 2115, 2119, 2121, 2127, 2132, 2133, 2141, 2143, 2144, 2145, 2148, 2153, 2154, 2157, 2166, 2168, 2174, 2176, 2177, 2180, 2187, 2194, 2201, 2206, 2207, 2210, 2213, 2216, 2218, 2220, 2226, 2231, 2232, 2235, 2238, 2244, 2258, 2259, 2267, 2269, 2270, 2272, 2274, 2275, 2277, 2279, 2285, 2288, 2289, 2290, 2293, 2296, 2297, 2302, 2305, 2306, 2307, 2308, 2311, 2313, 2314, 2318, 2319, 2321, 2322, 2323, 2325, 2326, 2328, 2329, 2332, 2335, 2339, 2340, 2341, 2342, 2345, 2348, 2351, 2356, 2362, 2366, 2367, 2368, 2370, 2372, 2373, 2376, 2377, 2379, 2380, 2387, 2389, 2394, 2398, 2400, 2404, 2406, 2408, 2413, 2422, 2426, 2430, 2432, 2436, 2437, 2438, 2439, 2440, 2450, 2451, 2455, 2461, 2468, 2472, 2476, 2478, 2479, 2496, 2500, 2501, 2506, 2514, 2515, 2517, 2519, 2523, 2532, 2534, 2539, 2543, 2544, 2546, 2552, 2553, 2559, 2561, 2564, 2566, 2570, 2571, 2572, 2576, 2579, 2580, 2585, 2587, 2592, 2593, 2596, 2600, 2610, 2613, 2619, 2622, 2625, 2632, 2642, 2644, 2645, 2647, 2654, 2655, 2657, 2659, 2660, 2661, 2662, 2670, 2678, 2679, 2683, 2684, 2686, 2690, 2691, 2693, 2694, 2695, 2696, 2700, 2703, 2705, 2706, 2707, 2713, 2716, 2718, 2720, 2723, 2725, 2728, 2741, 2742, 2743, 2756, 2759, 2760, 2766, 2768, 2773, 2774, 2778, 2781, 2782, 2783, 2785, 2789, 2791, 2793, 2794, 2795, 2799, 2800, 2804, 2809, 2810, 2811, 2814, 2818, 2819, 2821, 2827, 2835, 2836, 2837, 2838, 2844, 2851, 2853, 2856, 2858, 2859, 2862, 2863, 2867, 2871, 2875, 2886, 2889, 2897, 2899, 2904, 2909, 2911, 2926, 2929, 2932, 2933, 2935, 2936, 2942, 2950, 2956, 2960, 2966, 2972, 2973, 2974, 2976, 
2977, 2979, 2983, 2991, 2994, 2996, 3022, 3033, 3034, 3038, 3041, 3055, 3057, 3058, 3065, 3067, 3068, 3069, 3072, 3102, 3117, 3131, 3140, 3143, 3149, 3161, 3162, 3172, 3180, 3195, 3196, 3201, 3202, 3204, 3210, 3211, 3212, 3221, 3229, 3230, 3280, 3291, 3294, 3295, 3298, 3306, 3310, 3312, 3319, 3339, 3346, 3348, 3375, 3376, 3377, 3388, 3407, 3408, 3419, 3420, 3423, 3424, 3433, 3441, 3445, 3447, 3448, 3452, 3464, 3468, 3477, 3479, 3482, 3485, 3491, 3500, 3505, 3506, 3517, 3537, 3539, 3540, 3544, 3564, 3578, 3579, 3580, 3591, 3597, 3600, 3617, 3622, 3629, 3633, 3646, 3660, 3669, 3673, 3681, 3688, 3690, 3691, 3692, 3697, 3702, 3708, 3722, 3730, 3737, 3743, 3748, 3755, 3760, 3761, 3780, 3785, 3812, 3815, 3816, 3833, 3838, 3867, 3875, 3886, 3894, 3902, 3904, 3907, 3917, 3919, 3920, 3929, 3931, 3936, 3944, 3955, 3968, 3975, 3977, 3978, 3983, 3986, 3989, 4000, 4001, 4011, 4014, 4025, 4028, 4031, 4033, 4035, 4037, 4051, 4066, 4069, 4071, 4078, 4081, 4084, 4087, 4089, 4103, 4105, 4107, 4108, 4115, 4127, 4148, 4152, 4157, 4162, 4163, 4171, 4187, 4189, 4193, 4194, 4197, 4198, 4200, 4213, 4221, 4227, 4230, 4233, 4240, 4244, 4249, 4256, 4257, 4258, 4260, 4262, 4272, 4284, 4285, 4286, 4294, 4298, 4302, 4306, 4312, 4317, 4319, 4320, 4323, 4325, 4327, 4332, 4335, 4342, 4345, 4364, 4373, 4374, 4377, 4381, 4382, 4385, 4393, 4400, 4401, 4403, 4404, 4420, 4422, 4424, 4425, 4434, 4440, 4444, 4447, 4448, 4449, 4450, 4465, 4477, 4478, 4485, 4487, 4490, 4502, 4505, 4507, 4508, 4522, 4523, 4534, 4537, 4538, 4539, 4541, 4553, 4554, 4556, 4558, 4559, 4568, 4577, 4581, 4591, 4599, 4601, 4614, 4615, 4617, 4621, 4627, 4639, 4647, 4648, 4649, 4661, 4663, 4668, 4678, 4680, 4682, 4684, 4695, 4707, 4717, 4721, 4723, 4724, 4728, 4735, 4736, 4739, 4741, 4759, 4765, 4766, 4768, 4771, 4774, 4775, 4777, 4779, 4787, 4788, 4789, 4805, 4809, 4810, 4821, 4834, 4841, 4850, 4851, 4855, 4859, 4869, 4876, 4877, 4878, 4883, 4898, 4900, 4907, 4908, 4930, 4936, 4941, 4950, 4960, 4963, 4968, 4969, 4978, 4998, 4999, 
5003, 5006, 5010, 5021, 5033, 5042, 5055, 5056, 5063, 5067, 5070, 5071, 5079, 5087, 5093, 5094, 5098, 5121, 5126, 5130, 5131, 5147, 5151, 5154, 5155, 5160, 5163, 5171, 5175, 5204, 5206, 5207, 5208, 5209, 5210, 5220, 5228, 5231, 5237, 5238, 5249, 5253, 5257, 5258, 5265, 5294, 5295, 5296, 5301, 5304, 5309, 5321, 5324, 5326, 5329, 5332, 5337, 5343, 5346, 5347, 5363, 5368, 5373, 5376, 5377, 5380, 5381, 5390, 5396, 5399, 5405, 5406, 5410, 5415, 5429, 5433, 5440, 5441, 5454, 5460, 5467, 5468, 5470, 5472, 5473, 5476, 5482, 5484, 5485, 5486, 5487, 5488, 5505, 5507, 5538, 5561, 5562, 5577, 5582, 5593, 5605, 5617, 5647, 5660, 5661, 5666, 5673, 5674, 5675, 5677, 5679, 5689, 5714, 5715, 5731, 5736, 5743, 5744, 5745, 5747, 5762, 5765, 5768, 5769, 5770, 5772, 5775, 5810, 5811, 5813, 5828, 5855, 5869, 5878, 5902, 5906, 5914, 5915, 5916, 5933, 5940, 5957, 5968, 5974, 5980, 5985, 5993, 6029, 6032, 6062, 6074, 6077, 6091, 6118, 6120, 6125, 6132, 6134, 6142, 6146, 6151, 6162, 6172, 6179, 6180, 6201, 6209, 6214, 6220, 6230, 6232, 6244, 6261, 6272, 6274, 6288, 6294, 6306, 6322, 6323, 6334, 6347, 6349, 6377, 6380, 6384, 6385, 6388, 6389, 6392, 6397, 6410, 6430, 6434, 6439, 6447, 6465, 6472, 6474, 6499, 6500, 6502, 6513, 6516, 6524, 6538, 6540, 6545, 6549, 6553, 6567, 6577, 6578, 6582, 6583, 6586, 6623, 6632, 6644, 6646, 6651, 6652, 6653, 6667, 6673, 6674, 6687, 6694, 6696, 6712, 6725, 6728, 6735, 6755, 6781, 6782, 6796, 6799, 6804, 6811, 6812, 6815, 6825, 6833, 6834, 6842, 6852, 6854, 6861, 6870, 6877, 6883, 6893, 6929, 6931, 6932, 6949, 6962, 7005, 7082, 7083, 7094, 7104, 7119, 7140, 7143, 7260, 7264, 7295, 7314, 7336, 7351, 7379, 7403, 7406, 7412, 7428, 7441, 7446, 7457, 7484, 7491, 7510, 7525, 7547, 7553, 7587, 7596, 7625, 7632, 7637, 7639, 7642, 7644, 7654, 7655, 7725, 7756, 7779, 7823, 7847, 7861, 7879, 7880, 7898, 7918, 7945, 7969, 7978, 7988, 7991, 8008, 8020, 8095, 8103, 8120, 8161, 8189, 8224, 8237, 8268, 8321, 8346, 8376, 8380, 8381, 8392, 8396, 8412, 8446, 8463, 8474, 8500, 
8564, 8578, 8606, 8620, 8621, 8628, 8633, 8639, 8686, 8722, 8726, 8740, 8789, 8793, 8823, 8834, 8837, 8847, 8853, 8858, 8865, 8926, 8938, 8943, 8966, 8998, 9069, 9071, 9094, 9204, 9208, 9241, 9250, 9278, 9355, 9362, 9380, 9396, 9419, 9434, 9439, 9525, 9526, 9547, 9582, 9602, 9610, 9613, 9619, 9637, 9667, 9679, 9699, 9705, 9713, 9714, 9715, 9724, 9725, 9738, 9756, 9824, 9852, 9854, 9884, 9888, 9907, 9924, 9936, 9950, 9957, 9980, 9981, 9985, 9991, 9998, 10008, 10048, 10096, 10124, 10169, 10209, 10221, 10231, 10239, 10248, 10252, 10266, 10274, 10306, 10312, 10322, 10325, 10333, 10368, 10377, 10388, 10400, 10422, 10435, 10445, 10447, 10448, 10453, 10466, 10475, 10482, 10496, 10502, 10512, 10527, 10552, 10596, 10609, 10648, 10650, 10673, 10685, 10686, 10725, 10739, 10748, 10761, 10793, 10798, 10820, 10845, 10851, 10854, 10855, 10880, 10883, 10903, 10920, 10932, 10934, 10938, 10965, 10976, 10999, 11031, 11103, 11159, 11244, 11245, 11416, 11510, 11557, 11631, 11691, 11713, 11836, 11849, 11878, 11904, 11916, 11935, 11957, 11994, 11996, 12031, 12061, 12169, 12185, 12490, 12504, 12742, 12808, 12936, 12965, 13116, 13120, 13142, 13191, 13194, 13206, 13212, 13225, 13237, 13245, 13247, 13269, 13320, 13341, 13348, 13402, 13458, 13464, 13496, 13499, 13503, 13525, 13549, 13570, 13577, 13582, 13591, 13603, 13610, 13664, 13675, 13695, 13726, 13734, 13744, 13757, 13758, 13770, 13826, 13872, 13964, 13966, 13973, 14026, 14118, 14275, 14336, 14411, 14732, 14756, 14814, 14852, 15202, 15379, 15402, 15427, 15787, 15972, 15992, 16015, 16032, 16064, 16124, 16184, 16291, 16307, 16328, 16413, 16422, 16437, 16555, 16595, 16598, 16606, 16627, 16659, 16704, 16752, 16756, 16792, 16799, 16823, 16835, 16856, 16872, 16880, 16885, 16959, 17007, 17046, 17048, 17068, 17099, 17123, 17159, 17169, 17174, 17181, 17328, 17399, 17426, 17444, 17465, 17567, 17584, 17603, 17639, 17650, 17665, 17779, 17856, 17861, 17875, 17924, 18041, 18061, 18172, 18200, 18213, 18265, 18300, 18303, 18430, 18606, 18665, 18711, 
18722, 18740, 18789, 18824, 18844, 18880, 18947, 19033, 19038, 19045, 19049, 19068, 19081, 19094, 19141, 19144, 19161, 19187, 19189, 19190, 19197, 19202, 19232, 19237, 19246, 19272, 19328, 19331, 19392, 19406, 19419, 19431, 19436, 19502, 19513, 19517, 19537, 19545, 19566, 19586, 19613, 19626, 19627, 19636, 19658, 19665, 19701, 19717, 19740, 19833, 19845, 19847, 19859, 19926, 19943, 19973, 19999]
MonthlyRate : [2094, 2097, 2104, 2112, 2122, 2125, 2137, 2227, 2243, 2253, 2261, 2288, 2302, 2323, 2326, 2338, 2354, 2373, 2396, 2437, 2447, 2493, 2539, 2560, 2561, 2613, 2671, 2689, 2690, 2706, 2721, 2725, 2739, 2755, 2819, 2823, 2845, 2851, 2890, 2900, 2912, 2939, 2967, 2975, 2993, 2997, 3010, 3020, 3031, 3032, 3064, 3072, 3088, 3119, 3129, 3140, 3142, 3156, 3157, 3164, 3173, 3193, 3208, 3297, 3300, 3334, 3335, 3339, 3356, 3372, 3376, 3395, 3415, 3423, 3425, 3427, 3445, 3449, 3458, 3465, 3487, 3498, 3525, 3536, 3549, 3567, 3622, 3666, 3687, 3692, 3698, 3708, 3735, 3787, 3809, 3810, 3811, 3835, 3840, 3854, 3872, 3909, 3921, 3956, 3974, 3987, 3995, 4009, 4022, 4050, 4051, 4060, 4077, 4156, 4161, 4167, 4185, 4187, 4204, 4223, 4235, 4244, 4257, 4258, 4267, 4279, 4284, 4297, 4303, 4306, 4317, 4344, 4345, 4349, 4381, 4386, 4488, 4510, 4544, 4567, 4585, 4605, 4609, 4652, 4658, 4668, 4673, 4681, 4732, 4759, 4761, 4809, 4814, 4821, 4824, 4892, 4905, 4910, 4933, 4944, 4956, 4973, 4981, 4992, 5013, 5033, 5041, 5044, 5050, 5083, 5099, 5100, 5118, 5141, 5151, 5174, 5182, 5197, 5200, 5207, 5220, 5224, 5228, 5242, 5268, 5288, 5323, 5335, 5340, 5348, 5355, 5388, 5404, 5411, 5431, 5456, 5494, 5518, 5530, 5531, 5543, 5549, 5561, 5569, 5586, 5594, 5596, 5598, 5602, 5615, 5626, 5628, 5630, 5640, 5652, 5678, 5696, 5711, 5718, 5747, 5771, 5829, 5843, 5855, 5860, 5868, 5869, 5915, 5949, 5970, 5972, 5982, 6004, 6009, 6020, 6039, 6054, 6060, 6069, 6073, 6076, 6110, 6148, 6152, 6153, 6161, 6163, 6179, 6194, 6208, 6217, 6219, 6225, 6227, 6233, 6297, 6311, 6319, 6393, 6409, 6420, 6462, 6499, 6527, 6582, 6595, 6599, 6615, 6645, 6670, 6672, 6689, 6698, 6705, 6729, 6759, 6762, 6770, 6812, 6842, 6865, 6881, 6889, 6896, 6927, 6950, 6961, 6975, 6984, 6986, 6992, 7003, 7018, 7060, 7100, 7102, 7103, 7108, 7122, 7129, 7143, 7152, 7160, 7172, 7181, 7192, 7246, 7259, 7288, 7298, 7324, 7331, 7346, 7360, 7389, 7419, 7428, 7439, 7501, 7505, 7507, 7508, 7530, 7551, 7568, 7621, 7636, 7653, 7660, 7677, 
7679, 7693, 7703, 7713, 7739, 7744, 7747, 7770, 7790, 7791, 7815, 7824, 7858, 7909, 7914, 7950, 7973, 7975, 7999, 8007, 8018, 8039, 8040, 8045, 8053, 8059, 8191, 8192, 8202, 8213, 8232, 8269, 8277, 8302, 8306, 8318, 8319, 8346, 8386, 8392, 8416, 8423, 8429, 8450, 8456, 8489, 8504, 8509, 8532, 8544, 8552, 8556, 8571, 8635, 8658, 8733, 8751, 8758, 8770, 8787, 8800, 8828, 8841, 8842, 8847, 8861, 8863, 8870, 8891, 8916, 8931, 8935, 8952, 8978, 8984, 8989, 9051, 9060, 9068, 9075, 9096, 9100, 9125, 9128, 9129, 9148, 9150, 9192, 9238, 9241, 9250, 9255, 9256, 9260, 9262, 9277, 9278, 9282, 9314, 9358, 9364, 9369, 9396, 9489, 9490, 9518, 9528, 9541, 9558, 9571, 9606, 9647, 9655, 9659, 9679, 9687, 9696, 9697, 9724, 9731, 9732, 9752, 9755, 9769, 9834, 9867, 9873, 9931, 9945, 9946, 9947, 9953, 9961, 9964, 9973, 9977, 9983, 10007, 10022, 10034, 10036, 10056, 10074, 10077, 10084, 10092, 10110, 10138, 10195, 10205, 10224, 10225, 10227, 10228, 10261, 10268, 10293, 10302, 10310, 10322, 10332, 10333, 10339, 10410, 10414, 10415, 10425, 10436, 10494, 10503, 10515, 10531, 10554, 10557, 10558, 10589, 10642, 10675, 10697, 10732, 10735, 10748, 10778, 10781, 10826, 10842, 10846, 10849, 10877, 10893, 10901, 10910, 10919, 10942, 10950, 11005, 11012, 11031, 11038, 11092, 11133, 11135, 11141, 11148, 11162, 11179, 11189, 11262, 11275, 11288, 11309, 11314, 11354, 11373, 11380, 11411, 11439, 11473, 11479, 11512, 11533, 11535, 11539, 11563, 11585, 11591, 11652, 11677, 11693, 11737, 11740, 11757, 11761, 11781, 11806, 11825, 11827, 11864, 11866, 11868, 11873, 11879, 11882, 11912, 11924, 11925, 11929, 11934, 11983, 11992, 12023, 12066, 12069, 12086, 12090, 12102, 12106, 12124, 12127, 12145, 12147, 12154, 12227, 12241, 12250, 12253, 12278, 12287, 12288, 12290, 12291, 12313, 12315, 12355, 12368, 12388, 12392, 12414, 12421, 12430, 12449, 12477, 12482, 12530, 12549, 12682, 12695, 12719, 12740, 12761, 12826, 12828, 12832, 12853, 12858, 12862, 12888, 12916, 12930, 12932, 12947, 12982, 12992, 13008, 13022, 
13035, 13072, 13084, 13119, 13137, 13192, 13243, 13248, 13251, 13257, 13273, 13301, 13305, 13335, 13339, 13352, 13364, 13384, 13401, 13402, 13421, 13422, 13430, 13436, 13492, 13493, 13494, 13514, 13523, 13535, 13547, 13551, 13554, 13556, 13583, 13586, 13588, 13596, 13624, 13637, 13672, 13684, 13693, 13738, 13755, 13782, 13829, 13848, 13871, 13888, 13934, 13938, 13939, 13943, 13953, 13970, 13982, 13983, 14004, 14011, 14034, 14039, 14074, 14075, 14115, 14120, 14168, 14180, 14199, 14218, 14222, 14229, 14242, 14255, 14284, 14293, 14295, 14363, 14369, 14377, 14382, 14394, 14399, 14408, 14460, 14470, 14506, 14511, 14561, 14590, 14618, 14630, 14669, 14674, 14720, 14753, 14776, 14810, 14811, 14814, 14842, 14862, 14864, 14871, 14908, 14922, 14935, 14947, 14961, 14977, 15000, 15053, 15062, 15067, 15146, 15170, 15174, 15178, 15182, 15211, 15232, 15238, 15276, 15302, 15318, 15322, 15332, 15346, 15395, 15397, 15411, 15417, 15428, 15434, 15471, 15480, 15497, 15530, 15587, 15589, 15596, 15624, 15669, 15678, 15682, 15696, 15701, 15717, 15736, 15747, 15748, 15813, 15815, 15830, 15834, 15850, 15869, 15881, 15891, 15896, 15901, 15919, 15963, 15970, 15972, 15975, 15986, 15998, 15999, 16002, 16019, 16031, 16044, 16047, 16090, 16092, 16102, 16117, 16130, 16143, 16154, 16177, 16192, 16193, 16213, 16225, 16280, 16290, 16292, 16321, 16340, 16346, 16374, 16375, 16376, 16379, 16392, 16439, 16458, 16479, 16490, 16495, 16523, 16530, 16542, 16571, 16577, 16586, 16612, 16616, 16620, 16632, 16642, 16673, 16701, 16734, 16822, 16840, 16873, 16885, 16900, 16901, 16928, 16985, 16998, 17000, 17001, 17011, 17053, 17056, 17071, 17078, 17089, 17102, 17119, 17171, 17181, 17198, 17205, 17218, 17231, 17235, 17241, 17251, 17258, 17285, 17312, 17323, 17334, 17360, 17363, 17369, 17381, 17433, 17434, 17456, 17477, 17485, 17491, 17519, 17536, 17544, 17588, 17616, 17624, 17654, 17663, 17674, 17689, 17725, 17736, 17747, 17759, 17778, 17783, 17799, 17802, 17808, 17810, 17822, 17852, 17872, 17881, 17940, 17967, 
17970, 17997, 18016, 18024, 18079, 18089, 18092, 18103, 18115, 18154, 18168, 18203, 18235, 18256, 18264, 18275, 18300, 18384, 18385, 18398, 18410, 18420, 18437, 18500, 18575, 18597, 18611, 18624, 18625, 18640, 18659, 18685, 18686, 18697, 18698, 18706, 18725, 18767, 18775, 18779, 18783, 18787, 18798, 18830, 18863, 18869, 18899, 18938, 18959, 18991, 19002, 19028, 19100, 19106, 19121, 19124, 19146, 19170, 19188, 19191, 19196, 19225, 19239, 19246, 19255, 19271, 19281, 19293, 19294, 19299, 19305, 19332, 19345, 19368, 19373, 19383, 19384, 19394, 19461, 19479, 19494, 19519, 19555, 19558, 19562, 19566, 19573, 19588, 19609, 19627, 19630, 19655, 19658, 19665, 19682, 19711, 19715, 19719, 19730, 19737, 19757, 19760, 19764, 19783, 19788, 19805, 19826, 19863, 19877, 19899, 19905, 19911, 19920, 19921, 19944, 19948, 19982, 19989, 20002, 20003, 20006, 20100, 20115, 20156, 20161, 20165, 20206, 20232, 20234, 20251, 20260, 20284, 20293, 20308, 20317, 20328, 20335, 20338, 20364, 20366, 20392, 20420, 20431, 20439, 20445, 20460, 20462, 20467, 20471, 20489, 20490, 20497, 20520, 20586, 20619, 20623, 20652, 20682, 20689, 20715, 20739, 20750, 20763, 20794, 20898, 20925, 20933, 20938, 20943, 20948, 20978, 20989, 20990, 21016, 21026, 21029, 21030, 21057, 21072, 21075, 21081, 21082, 21086, 21123, 21141, 21143, 21146, 21158, 21173, 21195, 21196, 21199, 21203, 21214, 21221, 21222, 21293, 21378, 21412, 21436, 21437, 21445, 21447, 21457, 21495, 21509, 21519, 21526, 21530, 21534, 21602, 21624, 21630, 21632, 21643, 21653, 21698, 21703, 21708, 21728, 21731, 21777, 21782, 21813, 21816, 21821, 21829, 21831, 21833, 21922, 21923, 21972, 21981, 22002, 22021, 22049, 22052, 22061, 22074, 22087, 22088, 22098, 22102, 22107, 22128, 22149, 22154, 22162, 22174, 22217, 22245, 22262, 22266, 22308, 22310, 22376, 22384, 22422, 22455, 22456, 22474, 22477, 22478, 22482, 22490, 22495, 22534, 22539, 22553, 22573, 22577, 22578, 22589, 22604, 22645, 22650, 22653, 22656, 22670, 22673, 22710, 22722, 22789, 22792, 22794, 
22807, 22808, 22812, 22822, 22825, 22845, 22887, 22908, 22914, 22929, 22930, 22949, 22952, 22955, 22957, 22967, 22977, 22984, 23016, 23037, 23060, 23070, 23099, 23159, 23163, 23177, 23213, 23231, 23238, 23258, 23281, 23288, 23293, 23300, 23333, 23343, 23352, 23361, 23364, 23371, 23384, 23398, 23402, 23413, 23428, 23447, 23452, 23457, 23474, 23490, 23522, 23537, 23553, 23577, 23631, 23648, 23679, 23683, 23687, 23726, 23737, 23757, 23772, 23779, 23785, 23793, 23814, 23826, 23844, 23848, 23866, 23888, 23910, 23914, 23965, 23978, 24001, 24008, 24017, 24032, 24052, 24097, 24117, 24118, 24152, 24162, 24164, 24200, 24208, 24223, 24232, 24252, 24301, 24375, 24406, 24409, 24439, 24440, 24442, 24444, 24447, 24450, 24456, 24483, 24525, 24532, 24539, 24558, 24594, 24608, 24609, 24619, 24624, 24666, 24668, 24737, 24785, 24788, 24793, 24795, 24812, 24835, 24852, 24907, 24920, 24941, 24978, 25043, 25063, 25098, 25103, 25150, 25166, 25174, 25178, 25198, 25233, 25258, 25265, 25275, 25291, 25308, 25326, 25348, 25353, 25388, 25412, 25422, 25440, 25470, 25479, 25518, 25527, 25549, 25592, 25594, 25605, 25657, 25681, 25713, 25725, 25751, 25755, 25761, 25796, 25800, 25811, 25812, 25846, 25927, 25949, 25952, 25995, 26009, 26062, 26075, 26076, 26085, 26092, 26124, 26176, 26186, 26204, 26227, 26236, 26250, 26278, 26283, 26285, 26308, 26312, 26314, 26342, 26362, 26376, 26427, 26458, 26493, 26496, 26507, 26537, 26542, 26551, 26582, 26589, 26619, 26703, 26707, 26767, 26820, 26841, 26849, 26862, 26894, 26897, 26914, 26933, 26956, 26959, 26968, 26997, 26999]
NumCompaniesWorked : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Over18 : ['Y']
OverTime : ['No', 'Yes']
PercentSalaryHike : [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
PerformanceRating : [3, 4]
RelationshipSatisfaction : [1, 2, 3, 4]
StandardHours : [80]
StockOptionLevel : [0, 1, 2, 3]
TotalWorkingYears : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40]
TrainingTimesLastYear : [0, 1, 2, 3, 4, 5, 6]
WorkLifeBalance : [1, 2, 3, 4]
YearsAtCompany : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 36, 37, 40]
YearsInCurrentRole : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
YearsSinceLastPromotion : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
YearsWithCurrManager : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]

* As we can see, “Over18”, “StandardHours” and “EmployeeCount” contain the same value for every observation, so they carry no information and are not needed for visualizing the dataset.
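
Constant columns like these can also be found programmatically instead of by reading the printed value lists. The following is a minimal sketch on a made-up toy frame; the helper name `constant_columns` is an assumption for illustration and is not part of the notebook.

```python
import pandas as pd

def constant_columns(frame: pd.DataFrame) -> list:
    """Return the names of columns that hold a single distinct value."""
    # nunique() counts the distinct non-null values in each column
    counts = frame.nunique()
    return counts[counts <= 1].index.tolist()

# Toy frame standing in for the attrition data (values are made up)
toy = pd.DataFrame({
    "Age": [41, 49, 37],
    "Over18": ["Y", "Y", "Y"],
    "StandardHours": [80, 80, 80],
    "EmployeeCount": [1, 1, 1],
})
print(constant_columns(toy))
```

On the real dataframe the same call would be `constant_columns(df)`, and its result could be passed to `df.drop(columns=...)`.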

2. Data Preparation

2.1 Data Cleaning

Find missing data and remove columns that will not help with the visualization and analysis.

In [9]:
df.isnull().sum()
Out[9]:
Age                         0
Attrition                   0
BusinessTravel              0
DailyRate                   0
Department                  0
DistanceFromHome            0
Education                   0
EducationField              0
EmployeeCount               0
EmployeeNumber              0
EnvironmentSatisfaction     0
Gender                      0
HourlyRate                  0
JobInvolvement              0
JobLevel                    0
JobRole                     0
JobSatisfaction             0
MaritalStatus               0
MonthlyIncome               0
MonthlyRate                 0
NumCompaniesWorked          0
Over18                      0
OverTime                    0
PercentSalaryHike           0
PerformanceRating           0
RelationshipSatisfaction    0
StandardHours               0
StockOptionLevel            0
TotalWorkingYears           0
TrainingTimesLastYear       0
WorkLifeBalance             0
YearsAtCompany              0
YearsInCurrentRole          0
YearsSinceLastPromotion     0
YearsWithCurrManager        0
dtype: int64
In [10]:
df.count()
Out[10]:
Age                         1470
Attrition                   1470
BusinessTravel              1470
DailyRate                   1470
Department                  1470
DistanceFromHome            1470
Education                   1470
EducationField              1470
EmployeeCount               1470
EmployeeNumber              1470
EnvironmentSatisfaction     1470
Gender                      1470
HourlyRate                  1470
JobInvolvement              1470
JobLevel                    1470
JobRole                     1470
JobSatisfaction             1470
MaritalStatus               1470
MonthlyIncome               1470
MonthlyRate                 1470
NumCompaniesWorked          1470
Over18                      1470
OverTime                    1470
PercentSalaryHike           1470
PerformanceRating           1470
RelationshipSatisfaction    1470
StandardHours               1470
StockOptionLevel            1470
TotalWorkingYears           1470
TrainingTimesLastYear       1470
WorkLifeBalance             1470
YearsAtCompany              1470
YearsInCurrentRole          1470
YearsSinceLastPromotion     1470
YearsWithCurrManager        1470
dtype: int64
In [11]:
df.isnull().sum().any()
Out[11]:
False

* The three checks above look for missing values in different ways (per-column null counts, per-column non-null counts, and a single overall flag). All of them show that there are no missing values. Otherwise, we would have had to handle them, for example by dropping the affected rows or columns, or by replacing missing values with the column mean or with the previous (forward fill) or next (backward fill) value.
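
Since this dataset has no gaps, none of those techniques is actually applied in the notebook. As an illustration only, here is a small sketch of each option on a made-up column; the toy values and variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Toy column with gaps (made-up values, not taken from the attrition data)
toy = pd.DataFrame({"MonthlyIncome": [2000.0, np.nan, 4000.0, np.nan, 6000.0]})

dropped = toy.dropna()                 # drop rows that contain a missing value
mean_filled = toy.fillna(toy.mean())   # replace missing values with the column mean
forward_filled = toy.ffill()           # carry the previous value forward
backward_filled = toy.bfill()          # pull the next value backward

print(mean_filled["MonthlyIncome"].tolist())
print(forward_filled["MonthlyIncome"].tolist())
print(backward_filled["MonthlyIncome"].tolist())
```

Which option is appropriate depends on the column: mean replacement suits roughly symmetric numeric data, while forward and backward filling only make sense when the row order is meaningful.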

2.2 Remove unsupported columns

In [12]:
#drop unwanted columns

df = df.drop(['Over18','StandardHours','EmployeeCount'], axis=1)
In [13]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1470 entries, 0 to 1469
Data columns (total 32 columns):
Age                         1470 non-null int64
Attrition                   1470 non-null object
BusinessTravel              1470 non-null object
DailyRate                   1470 non-null int64
Department                  1470 non-null object
DistanceFromHome            1470 non-null int64
Education                   1470 non-null int64
EducationField              1470 non-null object
EmployeeNumber              1470 non-null int64
EnvironmentSatisfaction     1470 non-null int64
Gender                      1470 non-null object
HourlyRate                  1470 non-null int64
JobInvolvement              1470 non-null int64
JobLevel                    1470 non-null int64
JobRole                     1470 non-null object
JobSatisfaction             1470 non-null int64
MaritalStatus               1470 non-null object
MonthlyIncome               1470 non-null int64
MonthlyRate                 1470 non-null int64
NumCompaniesWorked          1470 non-null int64
OverTime                    1470 non-null object
PercentSalaryHike           1470 non-null int64
PerformanceRating           1470 non-null int64
RelationshipSatisfaction    1470 non-null int64
StockOptionLevel            1470 non-null int64
TotalWorkingYears           1470 non-null int64
TrainingTimesLastYear       1470 non-null int64
WorkLifeBalance             1470 non-null int64
YearsAtCompany              1470 non-null int64
YearsInCurrentRole          1470 non-null int64
YearsSinceLastPromotion     1470 non-null int64
YearsWithCurrManager        1470 non-null int64
dtypes: int64(24), object(8)
memory usage: 367.6+ KB

* Since “Over18”, “StandardHours” and “EmployeeCount” each hold a single constant value, we remove them: they carry no information, and dropping them slightly reduces the size of the dataframe to process.

2.3 Mapping data

In [14]:
df.dtypes
Out[14]:
Age                          int64
Attrition                   object
BusinessTravel              object
DailyRate                    int64
Department                  object
DistanceFromHome             int64
Education                    int64
EducationField              object
EmployeeNumber               int64
EnvironmentSatisfaction      int64
Gender                      object
HourlyRate                   int64
JobInvolvement               int64
JobLevel                     int64
JobRole                     object
JobSatisfaction              int64
MaritalStatus               object
MonthlyIncome                int64
MonthlyRate                  int64
NumCompaniesWorked           int64
OverTime                    object
PercentSalaryHike            int64
PerformanceRating            int64
RelationshipSatisfaction     int64
StockOptionLevel             int64
TotalWorkingYears            int64
TrainingTimesLastYear        int64
WorkLifeBalance              int64
YearsAtCompany               int64
YearsInCurrentRole           int64
YearsSinceLastPromotion      int64
YearsWithCurrManager         int64
dtype: object
In [15]:
df['Attrition'].unique()
Out[15]:
array(['Yes', 'No'], dtype=object)
In [16]:
# Map Attrition from Yes/No to 1/0, then map the ordinal codes of the other columns to text labels
Attrition_map = {"Yes" : 1, "No": 0}
print(Attrition_map)
df['Attrition']=df['Attrition'].map(Attrition_map)

Education_map = {1:"Below College", 2 :'College' ,3 : 'Bachelor' , 4 :'Master', 5 :'Doctor'}
df['Education'] = df['Education'].map(Education_map)



EnvironmentSatisfaction_map = {1 :"Low", 2:"Medium", 3:"High", 4:"Very High"}
df["EnvironmentSatisfaction"] = df["EnvironmentSatisfaction"].map(EnvironmentSatisfaction_map)

JobInvolvement_map = {1 :"Low", 2:"Medium", 3:"High", 4:"Very High"}
df["JobInvolvement"] = df["JobInvolvement"].map(JobInvolvement_map)

JobSatisfaction_map = {1 :"Low", 2:"Medium", 3:"High", 4:"Very High"}
df["JobSatisfaction"] = df["JobSatisfaction"].map(JobSatisfaction_map)

PerformanceRating_map = {1 :"Low", 2:"Medium", 3:"High", 4:"Outstanding"}
df["PerformanceRating"] = df["PerformanceRating"].map(PerformanceRating_map)

RelationshipSatisfaction_map = {1 :"Low", 2:"Medium", 3:"High", 4:"Outstanding"}
df["RelationshipSatisfaction"] = df["RelationshipSatisfaction"].map(RelationshipSatisfaction_map)

WorkLifeBalance_map = {1 :"Low", 2:"Medium", 3:"High", 4:"Outstanding"}
df["WorkLifeBalance"] = df["WorkLifeBalance"].map(WorkLifeBalance_map)
{'Yes': 1, 'No': 0}
In [17]:
df['Attrition'].unique()
Out[17]:
array([1, 0], dtype=int64)

2.4 Grouping / Binning Ages

In [18]:
df["Age"].describe()
Out[18]:
count    1470.000000
mean       36.923810
std         9.135373
min        18.000000
25%        30.000000
50%        36.000000
75%        43.000000
max        60.000000
Name: Age, dtype: float64
In [19]:
age_labels = ['18-24', '25-30', '31-35', '36-40', '41-45', '46-50', '51-55', '56-60']
df['age_group'] = pd.cut(df.Age, range(18, 61, 5), right=False, labels=age_labels)
In [20]:
df.head(3)
Out[20]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeNumber EnvironmentSatisfaction ... RelationshipSatisfaction StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager age_group
0 41 1 Travel_Rarely 1102 Sales 1 College Life Sciences 1 Medium ... Low 0 8 0 Low 6 4 0 5 41-45
1 49 0 Travel_Frequently 279 Research & Development 8 Below College Life Sciences 2 High ... Outstanding 1 10 3 High 10 7 1 7 51-55
2 37 1 Travel_Rarely 1373 Research & Development 2 College Other 4 Very High ... Medium 0 7 3 High 0 0 0 0 36-40

3 rows × 33 columns
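
The bin edges produced by `range(18, 61, 5)` are 18, 23, 28, ..., 58, so with `right=False` the first interval is [18, 23) even though it is labelled '18-24', and ages 58 to 60 fall outside the last interval and receive NaN (this is visible later for a 59-year-old employee). A possible sketch with explicit edges that match the labels is shown below; the name `age_edges` and the sample ages are assumptions for illustration.

```python
import pandas as pd

age_labels = ['18-24', '25-30', '31-35', '36-40', '41-45', '46-50', '51-55', '56-60']
# Left edge of each labelled group, plus a closing edge of 61 so that age 60 is included
age_edges = [18, 25, 31, 36, 41, 46, 51, 56, 61]

ages = pd.Series([18, 24, 25, 41, 49, 59, 60])  # made-up sample ages
# right=False makes every interval closed on the left: [18, 25), [25, 31), ...
groups = pd.cut(ages, bins=age_edges, right=False, labels=age_labels)
print(groups.astype(str).tolist())
```

Applied to the notebook's dataframe this would read `pd.cut(df.Age, bins=age_edges, right=False, labels=age_labels)`; the group assignments shown in the outputs would change accordingly.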

3. Exploring statistics on the dataset

3.1 Descriptive statistics

In [21]:
df.describe()
Out[21]:
Age Attrition DailyRate DistanceFromHome EmployeeNumber HourlyRate JobLevel MonthlyIncome MonthlyRate NumCompaniesWorked PercentSalaryHike StockOptionLevel TotalWorkingYears TrainingTimesLastYear YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager
count 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000 1470.000000
mean 36.923810 0.161224 802.485714 9.192517 1024.865306 65.891156 2.063946 6502.931293 14313.103401 2.693197 15.209524 0.793878 11.279592 2.799320 7.008163 4.229252 2.187755 4.123129
std 9.135373 0.367863 403.509100 8.106864 602.024335 20.329428 1.106940 4707.956783 7117.786044 2.498009 3.659938 0.852077 7.780782 1.289271 6.126525 3.623137 3.222430 3.568136
min 18.000000 0.000000 102.000000 1.000000 1.000000 30.000000 1.000000 1009.000000 2094.000000 0.000000 11.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 30.000000 0.000000 465.000000 2.000000 491.250000 48.000000 1.000000 2911.000000 8047.000000 1.000000 12.000000 0.000000 6.000000 2.000000 3.000000 2.000000 0.000000 2.000000
50% 36.000000 0.000000 802.000000 7.000000 1020.500000 66.000000 2.000000 4919.000000 14235.500000 2.000000 14.000000 1.000000 10.000000 3.000000 5.000000 3.000000 1.000000 3.000000
75% 43.000000 0.000000 1157.000000 14.000000 1555.750000 83.750000 3.000000 8379.000000 20461.500000 4.000000 18.000000 1.000000 15.000000 3.000000 9.000000 7.000000 3.000000 7.000000
max 60.000000 1.000000 1499.000000 29.000000 2068.000000 100.000000 5.000000 19999.000000 26999.000000 9.000000 25.000000 3.000000 40.000000 6.000000 40.000000 18.000000 15.000000 17.000000

3.2 Visualizing these statistics using boxplots

In [22]:
plt.rcParams["figure.figsize"] = (20,7)
df.boxplot()
plt.xticks(rotation=90)
Out[22]:
(array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18]), <a list of 18 Text xticklabel objects>)
In [23]:
# Attrition was already mapped to 1/0 in cell 16, so this replacement changes nothing here
df["Attrition"] = df["Attrition"].replace({"Yes": 1, "No": 0})

df
Out[23]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeNumber EnvironmentSatisfaction ... RelationshipSatisfaction StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager age_group
0 41 1 Travel_Rarely 1102 Sales 1 College Life Sciences 1 Medium ... Low 0 8 0 Low 6 4 0 5 41-45
1 49 0 Travel_Frequently 279 Research & Development 8 Below College Life Sciences 2 High ... Outstanding 1 10 3 High 10 7 1 7 51-55
2 37 1 Travel_Rarely 1373 Research & Development 2 College Other 4 Very High ... Medium 0 7 3 High 0 0 0 0 36-40
3 33 0 Travel_Frequently 1392 Research & Development 3 Master Life Sciences 5 Very High ... High 0 8 3 High 8 7 3 0 36-40
4 27 0 Travel_Rarely 591 Research & Development 2 Below College Medical 7 Low ... Outstanding 1 6 3 High 2 2 2 2 25-30
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1465 36 0 Travel_Frequently 884 Research & Development 23 College Medical 2061 High ... High 1 17 3 High 5 2 0 3 36-40
1466 39 0 Travel_Rarely 613 Research & Development 6 Below College Medical 2062 Very High ... Low 1 9 5 High 7 7 1 7 41-45
1467 27 0 Travel_Rarely 155 Research & Development 4 Bachelor Life Sciences 2064 Medium ... Medium 1 6 0 High 6 2 0 3 25-30
1468 49 0 Travel_Frequently 1023 Sales 2 Bachelor Medical 2065 Very High ... Outstanding 0 17 3 Medium 9 6 0 8 51-55
1469 34 0 Travel_Rarely 628 Research & Development 8 Bachelor Medical 2068 Medium ... Low 0 6 3 Outstanding 4 3 1 2 36-40

1470 rows × 33 columns

In [24]:
sns.boxplot(x='Education', y='Age', hue='Attrition', data=df)
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x21f5cbf69c8>

* The boxplot of all numeric columns (cell 22) shows that the value ranges of MonthlyIncome, MonthlyRate, EmployeeNumber and DailyRate are much larger than those of the remaining numeric columns. This can be corrected using normalization.

Visualizing the value distribution for each numeric column in the dataset

In [25]:
df.hist(bins=50,figsize=(20,16))
Out[25]:
[Figure: grid of histograms, one for each numeric column]

4. Visualizing the value distribution of individual variables and exploring their statistics

4.1 Attrition Rate

In [26]:
df.groupby(["Attrition"]).count()
Out[26]:
Age BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeNumber EnvironmentSatisfaction Gender ... RelationshipSatisfaction StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager age_group
Attrition
0 1233 1233 1233 1233 1233 1233 1233 1233 1233 1233 ... 1233 1233 1233 1233 1233 1233 1233 1233 1233 1209
1 237 237 237 237 237 237 237 237 237 237 ... 237 237 237 237 237 237 237 237 237 232

2 rows × 32 columns

In [27]:
df["Attrition"].value_counts()
Out[27]:
0    1233
1     237
Name: Attrition, dtype: int64
In [28]:
emp_attrition = df[df["Attrition"] == 1]
emp_attrition = emp_attrition["Attrition"].count()
print("The total number of employees who left the company (Attrition = 1) is:", emp_attrition)
The total number of employees who left the company (Attrition = 1) is: 237
In [29]:
emp_no_attrition = df[df["Attrition"] == 0]
emp_no_attrition = emp_no_attrition["Attrition"].count()
print("The total number of employees who stayed (Attrition = 0) is:", emp_no_attrition)
The total number of employees who stayed (Attrition = 0) is: 1233
In [30]:
# Show the percentage of each unique class label in the target Attrition column
df['Attrition'].value_counts()/len(df['Attrition'])*100
Out[30]:
0    83.877551
1    16.122449
Name: Attrition, dtype: float64
In [31]:
#Visualize the result 
plt.rcParams["figure.figsize"] = (7,7)
ax = sns.countplot(x='Attrition', data=df)
for p in ax.patches:
    ax.annotate('{}'.format(p.get_height()), (p.get_x(), p.get_height()+1))

plt.title("Employee Attrition")
plt.show()

* The percentages above show that almost 84% of the employees in the dataset did not leave the company. The data is therefore imbalanced between the two class labels of the 'Attrition' target column (83.9% for 'No' and 16.1% for 'Yes'), so the classes should be balanced, or the imbalance otherwise accounted for, when training a classifier.
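
The notebook does not say which balancing technique it has in mind. One simple option, shown here purely as an assumption, is random oversampling of the minority class; class weights or synthetic sampling are common alternatives. The helper name `oversample_minority` and the toy data are hypothetical.

```python
import pandas as pd

def oversample_minority(frame, target, random_state=0):
    """Randomly duplicate minority-class rows until both classes have equal size."""
    counts = frame[target].value_counts()
    minority_label = counts.idxmin()
    minority = frame[frame[target] == minority_label]
    # Sample with replacement to create the extra minority rows
    extra = minority.sample(n=counts.max() - counts.min(), replace=True,
                            random_state=random_state)
    return pd.concat([frame, extra], ignore_index=True)

# Toy frame with an 80/20 split (made-up values)
toy = pd.DataFrame({"Age": range(10), "Attrition": [0] * 8 + [1] * 2})
balanced = oversample_minority(toy, "Attrition")
print(balanced["Attrition"].value_counts().to_dict())
```

If this approach is used, it should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.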

In [32]:
#using interactive graph
from plotly.offline import init_notebook_mode,iplot
import plotly.graph_objs as go
groups = df["Attrition"]
amount = df["Attrition"].value_counts()
colors = ['red', 'blue']
trace = go.Pie(labels=["No","Yes"], values=amount,
               hoverinfo='label+percent', textinfo='value',
               textfont=dict(size=25),
               marker=dict(colors=colors,
                           line=dict(color='#000000', width=3)))

iplot([trace])

4.2 Finding correlation between variables

In [33]:
data_correlation = df.corr()
plt.rcParams["figure.figsize"] = [20,10]
sns.heatmap(data_correlation,xticklabels=data_correlation.columns,yticklabels=data_correlation.columns,annot=True,cmap="Blues")


print("How can we justify the numbers with boxs?")
How can we justify the numbers with boxs?

The correlation analysis shows some interesting findings. First, there is a high positive correlation between “TotalWorkingYears” and both “JobLevel” and “MonthlyIncome”, which suggests that promotion and pay in the company follow experience. Second, there is a high positive correlation between “PerformanceRating” and “PercentSalaryHike”, which suggests that salary increases follow performance. Third, “JobSatisfaction” is not correlated with any of the remaining numeric columns, which is somewhat unexpected, as one might expect it to rise with “MonthlyIncome” or “JobLevel”.
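
Rather than reading the strongest relationships off the heatmap by eye, they can be listed directly from the correlation matrix. The sketch below uses a made-up toy frame, and the helper name `strongest_correlations` is an assumption. It works on numeric columns only, so columns that were mapped to text labels earlier (such as PerformanceRating and JobSatisfaction) would need their original integer codes to take part.

```python
import numpy as np
import pandas as pd

def strongest_correlations(frame, top=3):
    """Return the `top` column pairs with the largest absolute Pearson correlation."""
    corr = frame.select_dtypes("number").corr()
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    # Order the pairs by absolute strength, strongest first
    order = pairs.abs().sort_values(ascending=False).index
    return pairs.reindex(order).head(top)

# Toy frame (made-up values) in which income grows with experience
toy = pd.DataFrame({
    "TotalWorkingYears": [1, 5, 10, 20, 30],
    "MonthlyIncome": [2000, 4000, 7000, 12000, 19000],
    "DistanceFromHome": [5, 1, 9, 2, 7],
})
print(strongest_correlations(toy, top=1))
```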

Normalizing the dataset

Before going deeper into visualizing the dataset, it is better to normalize it so that columns with very different scales and variances become comparable.

In [34]:
from sklearn.preprocessing import StandardScaler

standard=df.copy()
val=standard.select_dtypes("int64")

col_names=list(val.columns)

features =  val[col_names]
scaler = StandardScaler().fit(features.values)
features = scaler.transform(features.values)

standard[col_names] = features
standard
Out[34]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeNumber EnvironmentSatisfaction ... RelationshipSatisfaction StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager age_group
0 0.446350 2.280906 Travel_Rarely 0.742527 Sales -1.010909 College Life Sciences -1.701283 Medium ... Low -0.932014 -0.421642 -2.171982 Low -0.164613 -0.063296 -0.679146 0.245834 41-45
1 1.322365 -0.438422 Travel_Frequently -1.297775 Research & Development -0.147150 Below College Life Sciences -1.699621 High ... Outstanding 0.241988 -0.164511 0.155707 High 0.488508 0.764998 -0.368715 0.806541 51-55
2 0.008343 2.280906 Travel_Rarely 1.414363 Research & Development -0.887515 College Other -1.696298 Very High ... Medium -0.932014 -0.550208 0.155707 High -1.144294 -1.167687 -0.679146 -1.155935 36-40
3 -0.429664 -0.438422 Travel_Frequently 1.461466 Research & Development -0.764121 Master Life Sciences -1.694636 Very High ... High -0.932014 -0.421642 0.155707 High 0.161947 0.764998 0.252146 -1.155935 36-40
4 -1.086676 -0.438422 Travel_Rarely -0.524295 Research & Development -0.887515 Below College Medical -1.691313 Low ... Outstanding 0.241988 -0.678774 0.155707 High -0.817734 -0.615492 -0.058285 -0.595227 25-30
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1465 -0.101159 -0.438422 Travel_Frequently 0.202082 Research & Development 1.703764 College Medical 1.721670 High ... High 0.241988 0.735447 0.155707 High -0.327893 -0.615492 -0.679146 -0.314873 36-40
1466 0.227347 -0.438422 Travel_Rarely -0.469754 Research & Development -0.393938 Below College Medical 1.723332 Very High ... Low 0.241988 -0.293077 1.707500 High -0.001333 0.764998 -0.368715 0.806541 41-45
1467 -1.086676 -0.438422 Travel_Rarely -1.605183 Research & Development -0.640727 Bachelor Life Sciences 1.726655 Medium ... Medium 0.241988 -0.678774 -2.171982 High -0.164613 -0.615492 -0.679146 -0.314873 25-30
1468 1.322365 -0.438422 Travel_Frequently 0.546677 Sales -0.887515 Bachelor Medical 1.728317 Very High ... Outstanding -0.932014 0.735447 0.155707 Medium 0.325228 0.488900 -0.679146 1.086895 51-55
1469 -0.320163 -0.438422 Travel_Rarely -0.432568 Research & Development -0.147150 Bachelor Medical 1.733302 Medium ... Low -0.932014 -0.678774 0.155707 Outstanding -0.491174 -0.339394 -0.368715 -0.595227 36-40

1470 rows × 33 columns
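
The cell above standardizes every int64 column, which includes the Attrition target and the EmployeeNumber identifier (the Attrition column now holds values such as 2.280906 and -0.438422). If those columns should keep their original values, a pandas-only sketch like the following could be used instead; the helper `standardize` and its `exclude` parameter are assumptions for illustration.

```python
import pandas as pd

def standardize(frame, exclude=()):
    """Z-score the numeric columns of `frame`, leaving the `exclude` columns untouched."""
    out = frame.copy()
    cols = [c for c in out.select_dtypes("number").columns if c not in exclude]
    # ddof=0 gives the population standard deviation, which is what StandardScaler uses
    out[cols] = (out[cols] - out[cols].mean()) / out[cols].std(ddof=0)
    return out

# Toy frame (made-up values): Age is scaled, the Attrition target is left as is
toy = pd.DataFrame({"Age": [20, 30, 40], "Attrition": [1, 0, 0]})
scaled = standardize(toy, exclude=("Attrition",))
print(scaled)
```

On the notebook's dataframe a call such as `standardize(df, exclude=("Attrition", "EmployeeNumber"))` would leave the target and the identifier unchanged.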

4.2 Relationship of Age Variable with Attrition

In [35]:
df.groupby(['age_group']).size().plot(kind='bar',stacked=True)
plt.title("Distribution of Age Groups",fontsize=14)
plt.ylabel('Count')
plt.xlabel('Age Group');
In [36]:
sns.distplot(df["Age"])
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x21f605eb588>
In [37]:
youngest = df['Age'].min()
print("The youngest employee in the company is aged:", youngest)
The youngest employee in the company is aged: 18
In [38]:
oldest = df['Age'].max()
print("The oldest employee in the company is aged:", oldest)
The oldest employee in the company is aged: 60
In [39]:
#finding out who was the oldest employee (note: .loc[oldest, :] selects the row with index label 60, not the row where Age == 60)
df.loc[oldest,:]
Out[39]:
Age                                             32
Attrition                                        0
BusinessTravel                       Travel_Rarely
DailyRate                                      427
Department                  Research & Development
DistanceFromHome                                 1
Education                                 Bachelor
EducationField                             Medical
EmployeeNumber                                  78
EnvironmentSatisfaction                        Low
Gender                                        Male
HourlyRate                                      33
JobInvolvement                                High
JobLevel                                         2
JobRole                     Manufacturing Director
JobSatisfaction                          Very High
MaritalStatus                              Married
MonthlyIncome                                 6162
MonthlyRate                                  10877
NumCompaniesWorked                               1
OverTime                                       Yes
PercentSalaryHike                               22
PerformanceRating                      Outstanding
RelationshipSatisfaction                    Medium
StockOptionLevel                                 1
TotalWorkingYears                                9
TrainingTimesLastYear                            3
WorkLifeBalance                               High
YearsAtCompany                                   9
YearsInCurrentRole                               8
YearsSinceLastPromotion                          7
YearsWithCurrManager                             8
age_group                                    31-35
Name: 60, dtype: object
In [40]:
#finding out who the youngest employees were
df[df['Age']==youngest]
Out[40]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeNumber EnvironmentSatisfaction ... RelationshipSatisfaction StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager age_group
296 18 1 Travel_Rarely 230 Research & Development 3 Bachelor Life Sciences 405 High ... High 0 0 2 High 0 0 0 0 18-24
301 18 0 Travel_Rarely 812 Sales 10 Bachelor Medical 411 Very High ... Low 0 0 2 High 0 0 0 0 18-24
457 18 1 Travel_Frequently 1306 Sales 5 Bachelor Marketing 614 Medium ... Outstanding 0 0 3 High 0 0 0 0 18-24
727 18 0 Non-Travel 287 Research & Development 5 College Life Sciences 1012 Medium ... Outstanding 0 0 2 High 0 0 0 0 18-24
828 18 1 Non-Travel 247 Research & Development 8 Below College Medical 1156 High ... Outstanding 0 0 0 High 0 0 0 0 18-24
972 18 0 Non-Travel 1124 Research & Development 1 Bachelor Life Sciences 1368 Very High ... High 0 0 5 Outstanding 0 0 0 0 18-24
1153 18 1 Travel_Frequently 544 Sales 3 College Medical 1624 Medium ... High 0 0 2 Outstanding 0 0 0 0 18-24
1311 18 0 Non-Travel 1431 Research & Development 14 Bachelor Medical 1839 Medium ... High 0 0 4 Low 0 0 0 0 18-24

8 rows × 33 columns

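As the results above show, the oldest age in the dataset is 60 and the youngest is 18. Note that df.loc[oldest, :] returns the row with index label 60, which belongs to a 32-year-old employee who did not leave, so it does not describe the oldest employee. Eight employees are 18 years old: four of them left the company (Attrition = 1) and four stayed.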
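
To look at the oldest employees themselves, the rows have to be selected by value rather than by index label, in the same way the youngest employees were selected. A minimal sketch on made-up data follows; the variable names are assumptions.

```python
import pandas as pd

# Toy frame (made-up values); the index labels are unrelated to the ages
toy = pd.DataFrame(
    {"Age": [41, 60, 18, 60, 18], "Attrition": [1, 0, 1, 1, 0]},
    index=[0, 1, 2, 3, 4],
)

oldest_age = toy["Age"].max()
# Boolean mask: compares values in the Age column, not index labels
oldest_rows = toy[toy["Age"] == oldest_age]
# Share of leavers among the oldest employees (Attrition is coded 1/0)
oldest_attrition_rate = oldest_rows["Attrition"].mean()

print(oldest_rows)
print(oldest_attrition_rate)
```

On the notebook's dataframe the equivalent selection would be `df[df['Age'] == oldest]`.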

In [41]:
positive_attrition_df = df.loc[df['Attrition'] == 1]
negative_attrition_df = df.loc[df['Attrition'] == 0]
In [42]:
negative_attrition_df.head()
Out[42]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeNumber EnvironmentSatisfaction ... RelationshipSatisfaction StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager age_group
1 49 0 Travel_Frequently 279 Research & Development 8 Below College Life Sciences 2 High ... Outstanding 1 10 3 High 10 7 1 7 51-55
3 33 0 Travel_Frequently 1392 Research & Development 3 Master Life Sciences 5 Very High ... High 0 8 3 High 8 7 3 0 36-40
4 27 0 Travel_Rarely 591 Research & Development 2 Below College Medical 7 Low ... Outstanding 1 6 3 High 2 2 2 2 25-30
5 32 0 Travel_Frequently 1005 Research & Development 2 College Life Sciences 8 Very High ... High 0 8 2 Medium 7 7 3 6 31-35
6 59 0 Travel_Rarely 1324 Research & Development 3 Bachelor Medical 10 High ... Low 3 12 3 Medium 1 0 0 0 NaN

5 rows × 33 columns

In [43]:
positive_attrition_df.head()
Out[43]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeNumber EnvironmentSatisfaction ... RelationshipSatisfaction StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager age_group
0 41 1 Travel_Rarely 1102 Sales 1 College Life Sciences 1 Medium ... Low 0 8 0 Low 6 4 0 5 41-45
2 37 1 Travel_Rarely 1373 Research & Development 2 College Other 4 Very High ... Medium 0 7 3 High 0 0 0 0 36-40
14 28 1 Travel_Rarely 103 Research & Development 24 Bachelor Life Sciences 19 High ... Medium 0 6 4 High 4 2 0 3 31-35
21 36 1 Travel_Rarely 1218 Sales 9 Master Life Sciences 27 High ... Medium 0 10 4 High 5 3 0 3 36-40
24 34 1 Travel_Rarely 699 Research & Development 6 Below College Medical 31 Medium ... High 0 8 2 High 4 2 1 3 36-40

5 rows × 33 columns
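With the employees split by attrition, a per-group numeric summary of a column such as MonthlyIncome gives a quick comparison without building two separate frames. A minimal sketch, assuming a made-up frame (`toy` and its values are illustrative only):

```python
import pandas as pd

# Illustrative data only; these values are not from the attrition dataset.
toy = pd.DataFrame({
    "Attrition": [1, 1, 0, 0, 0],
    "MonthlyIncome": [2000, 3000, 5000, 6000, 7000],
})

# One row per Attrition value, with the group size and two measures of centre.
income_summary = toy.groupby("Attrition")["MonthlyIncome"].agg(["count", "mean", "median"])
print(income_summary)
```

Reporting the count next to the mean and median makes it easier to judge how much weight each group's figures deserve.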

In [44]:
# overlay the monthly-income distributions of employees who stayed and employees who left
sns.distplot(negative_attrition_df['MonthlyIncome'], label='Negative attrition')
sns.distplot(positive_attrition_df['MonthlyIncome'], label='Positive attrition')
plt.legend()
Out[44]:
<matplotlib.legend.Legend at 0x21f60f25708>
In [45]:
type(emp_attrition)
Out[45]:
numpy.int32
In [46]:
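from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

# NOTE: this overwrites df with its first 30 rows, so every later cell sees only that sample
df = df.head(30)
# count employees at each age, separately for those who left and those who stayed,
# so that x and y of each trace have matching lengths
yes_counts = df.loc[df['Attrition'] == 1, 'Age'].value_counts().sort_index()
no_counts = df.loc[df['Attrition'] == 0, 'Age'].value_counts().sort_index()
trace1 = go.Bar(x=yes_counts.index, y=yes_counts.values, name='Yes')
trace2 = go.Bar(x=no_counts.index, y=no_counts.values, name='No')
data = [trace1, trace2]
layout = go.Layout(barmode='group')
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')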
In [47]:
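# one histogram per attrition outcome; the labels feed the legend
plt.hist(df['Age'][df["Attrition"]==1], bins=80, histtype="bar", label="Attrition: Yes")
plt.hist(df['Age'][df["Attrition"]==0], bins=80, histtype="bar", label="Attrition: No")
plt.legend(loc='upper right')

# call the labelling functions; assigning to plt.xlabel would overwrite them with strings
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.title('The distribution for the Age', fontsize=18)

plt.xticks(rotation=90)

plt.tight_layout()
plt.savefig('Age.png', dpi=300)

plt.show()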
In [48]:
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

# df already holds the 30-row sample taken in an earlier cell
# count employees at each tenure, separately for those who left and those who stayed
yes_counts = df.loc[df['Attrition'] == 1, 'YearsAtCompany'].value_counts().sort_index()
no_counts = df.loc[df['Attrition'] == 0, 'YearsAtCompany'].value_counts().sort_index()
trace1 = go.Bar(x=yes_counts.index, y=yes_counts.values, name='Yes')
trace2 = go.Bar(x=no_counts.index, y=no_counts.values, name='No')
data = [trace1, trace2]
layout = go.Layout(barmode='group')
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')
In [49]:
# number of employees at each job-satisfaction level
job = df['JobSatisfaction'].value_counts()
plt.figure(figsize=(10,5))
sns.barplot(x=job.index, y=job.values, alpha=0.8)
plt.title('Job satisfaction levels')
plt.show()
In [50]:
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

# df already holds the 30-row sample taken in an earlier cell
# count employees by years since last promotion, separately for those who left and those who stayed
yes_counts = df.loc[df['Attrition'] == 1, 'YearsSinceLastPromotion'].value_counts().sort_index()
no_counts = df.loc[df['Attrition'] == 0, 'YearsSinceLastPromotion'].value_counts().sort_index()
trace1 = go.Bar(x=yes_counts.index, y=yes_counts.values, name='Yes')
trace2 = go.Bar(x=no_counts.index, y=no_counts.values, name='No')
data = [trace1, trace2]
layout = go.Layout(barmode='group')
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')
In [51]:
# number of employees in each business-travel category
business = df['BusinessTravel'].value_counts()
plt.figure(figsize=(10,5))
sns.barplot(x=business.index, y=business.values, alpha=0.8)
plt.title('Different types of travel')
plt.show()
In [52]:
df
Out[52]:
Age Attrition BusinessTravel DailyRate Department DistanceFromHome Education EducationField EmployeeNumber EnvironmentSatisfaction ... RelationshipSatisfaction StockOptionLevel TotalWorkingYears TrainingTimesLastYear WorkLifeBalance YearsAtCompany YearsInCurrentRole YearsSinceLastPromotion YearsWithCurrManager age_group
0 41 1 Travel_Rarely 1102 Sales 1 College Life Sciences 1 Medium ... Low 0 8 0 Low 6 4 0 5 41-45
1 49 0 Travel_Frequently 279 Research & Development 8 Below College Life Sciences 2 High ... Outstanding 1 10 3 High 10 7 1 7 51-55
2 37 1 Travel_Rarely 1373 Research & Development 2 College Other 4 Very High ... Medium 0 7 3 High 0 0 0 0 36-40
3 33 0 Travel_Frequently 1392 Research & Development 3 Master Life Sciences 5 Very High ... High 0 8 3 High 8 7 3 0 36-40
4 27 0 Travel_Rarely 591 Research & Development 2 Below College Medical 7 Low ... Outstanding 1 6 3 High 2 2 2 2 25-30
5 32 0 Travel_Frequently 1005 Research & Development 2 College Life Sciences 8 Very High ... High 0 8 2 Medium 7 7 3 6 31-35
6 59 0 Travel_Rarely 1324 Research & Development 3 Bachelor Medical 10 High ... Low 3 12 3 Medium 1 0 0 0 NaN
7 30 0 Travel_Rarely 1358 Research & Development 24 Below College Life Sciences 11 Very High ... Medium 1 1 2 High 1 0 0 0 31-35
8 38 0 Travel_Frequently 216 Research & Development 23 Bachelor Life Sciences 12 Very High ... Medium 0 10 2 High 9 7 1 8 41-45
9 36 0 Travel_Rarely 1299 Research & Development 27 Bachelor Medical 13 High ... Medium 2 17 3 Medium 7 7 7 7 36-40
10 35 0 Travel_Rarely 809 Research & Development 16 Bachelor Medical 14 Low ... High 1 6 5 High 5 4 0 3 36-40
11 29 0 Travel_Rarely 153 Research & Development 15 College Life Sciences 15 Very High ... Outstanding 0 10 3 High 9 5 0 8 31-35
12 31 0 Travel_Rarely 670 Research & Development 26 Below College Life Sciences 16 Low ... Outstanding 1 5 1 Medium 5 2 4 3 31-35
13 34 0 Travel_Rarely 1346 Research & Development 19 College Medical 18 Medium ... High 1 3 2 High 2 2 1 2 36-40
14 28 1 Travel_Rarely 103 Research & Development 24 Bachelor Life Sciences 19 High ... Medium 0 6 4 High 4 2 0 3 31-35
15 29 0 Travel_Rarely 1389 Research & Development 21 Master Life Sciences 20 Medium ... High 1 10 1 High 10 9 8 8 31-35
16 32 0 Travel_Rarely 334 Research & Development 5 College Life Sciences 21 Low ... Outstanding 2 7 5 Medium 6 2 0 5 31-35
17 22 0 Non-Travel 1123 Research & Development 16 College Medical 22 Very High ... Medium 2 1 2 Medium 1 0 0 0 18-24
18 53 0 Travel_Rarely 1219 Sales 2 Master Life Sciences 23 Low ... High 0 31 3 High 25 8 3 7 56-60
19 38 0 Travel_Rarely 371 Research & Development 2 Bachelor Life Sciences 24 Very High ... High 0 6 3 High 3 2 1 2 41-45
20 24 0 Non-Travel 673 Research & Development 11 College Other 26 Low ... Outstanding 1 5 5 Medium 4 2 1 3 25-30
21 36 1 Travel_Rarely 1218 Sales 9 Master Life Sciences 27 High ... Medium 0 10 4 High 5 3 0 3 36-40
22 34 0 Travel_Rarely 419 Research & Development 7 Master Life Sciences 28 Low ... High 0 13 4 High 12 6 2 11 36-40
23 21 0 Travel_Rarely 391 Research & Development 15 College Life Sciences 30 High ... Outstanding 0 0 6 High 0 0 0 0 18-24
24 34 1 Travel_Rarely 699 Research & Development 6 Below College Medical 31 Medium ... High 0 8 2 High 4 2 1 3 36-40
25 53 0 Travel_Rarely 1282 Research & Development 5 Bachelor Other 32 High ... Outstanding 1 26 3 Medium 14 13 4 8 56-60
26 32 1 Travel_Frequently 1125 Research & Development 16 Below College Life Sciences 33 Medium ... Medium 0 10 5 High 10 2 6 7 31-35
27 42 0 Travel_Rarely 691 Sales 8 Master Marketing 35 High ... Outstanding 1 10 2 High 9 7 4 2 41-45
28 44 0 Travel_Rarely 477 Research & Development 7 Master Medical 36 Low ... Outstanding 1 24 4 High 22 6 5 17 46-50
29 46 0 Travel_Rarely 705 Sales 2 Master Marketing 38 Medium ... Outstanding 0 22 2 Medium 2 2 2 1 46-50

30 rows × 33 columns
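The table above has only 30 rows because the earlier plotting cells reassigned `df = df.head(30)`, which replaces the full dataset for every later cell. A minimal sketch of one way to avoid that, assuming a made-up frame (`full` and `sample` are hypothetical names used only for illustration):

```python
import pandas as pd

# Illustrative data only: 50 made-up employees.
full = pd.DataFrame({"Age": range(20, 70), "Attrition": [0, 1] * 25})

# Bind the preview to its own name so the full frame stays intact.
sample = full.head(30)

print(len(full), len(sample))
```

Plots that need a small preview can use `sample`, while summaries such as group means keep using every row of `full`.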

In [53]:
# pip's -c flag expects a constraints file (conda uses -c for channels), so it is dropped here
!pip install chart-studio
In [54]:
x=['giraffes', 'orangutans', 'monkeys']
y=[12, 18, 29]
zoo=pd.DataFrame(x,columns=['animals'])
zoo['value']=y
zoo
Out[54]:
animals value
0 giraffes 12
1 orangutans 18
2 monkeys 29
In [55]:
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

# single-series bar chart: one bar per animal, bar height = its count
trace4 = go.Bar(
    x=zoo['animals'],
    y=zoo['value'],
    name='ZOO')
data = [trace4]
layout = go.Layout(barmode='group')
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='grouped-bar')

Here I plotted the number of animals of each type. The data shown above has two columns, the animal name and the count for that species, and the bar chart displays those counts.

In [56]:
print (df.groupby(['age_group']).Attrition.mean())
age_group
18-24    0.000
25-30    0.000
31-35    0.250
36-40    0.375
41-45    0.250
46-50    0.000
51-55    0.000
56-60    0.000
Name: Attrition, dtype: float64

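Here I computed the mean of Attrition for each age group. Because Attrition is coded as 0/1, the mean is the share of employees in that group who left. For example, 0.375 for the 36-40 group means that 37.5% of the employees in that group left. Note that df was cut down to its first 30 rows in an earlier cell, so these rates describe only that 30-row sample.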
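To make the link between the mean and the attrition rate concrete, the group mean can be reported next to the underlying counts. A minimal sketch, assuming a made-up frame (`toy` and its values are illustrative, and the output column names `leavers`, `employees` and `rate` are chosen here):

```python
import pandas as pd

# Illustrative data only; these values are not from the attrition dataset.
toy = pd.DataFrame({
    "age_group": ["31-35", "31-35", "31-35", "31-35", "36-40", "36-40"],
    "Attrition": [1, 0, 0, 0, 1, 0],
})

# For a 0/1 column: sum = number who left, count = group size, mean = sum / count.
rates = toy.groupby("age_group")["Attrition"].agg(
    leavers="sum", employees="count", rate="mean"
)
print(rates)
```

Showing the group size next to the rate helps flag groups where the rate rests on only a handful of employees.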